Nonlinear random matrix theory for deep learning
Authors
Abstract
Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method. The test case for our study is the Gram matrix Y^T Y, Y = f(WX), where W is a random weight matrix, X is a random data matrix, and f is a pointwise nonlinear activation function. We derive an explicit representation for the trace of the resolvent of this matrix, which defines its limiting spectral distribution. We apply these results to the computation of the asymptotic performance of single-layer random feature networks on a memorization task and to the analysis of the eigenvalues of the data covariance matrix as it propagates through a neural network. As a byproduct of our analysis, we identify an intriguing new class of activation functions with favorable properties.
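As an informal numerical illustration of the object studied in the abstract (not code from the paper), the sketch below samples Gaussian W and X, applies a pointwise nonlinearity, and examines the spectrum of the Gram matrix Y^T Y / m together with an empirical trace of its resolvent. The dimensions, the 1/sqrt(n0) normalization of W, and the choice of f = tanh are assumptions made for the example.

```python
# Minimal sketch: empirical spectrum of the nonlinear Gram matrix Y^T Y / m,
# with Y = f(WX). Dimensions, normalization, and f = tanh are illustrative
# choices, not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)

n0, n1, m = 1000, 1000, 1500                      # input dim, width, number of samples
W = rng.standard_normal((n1, n0)) / np.sqrt(n0)   # random weight matrix, entries of variance 1/n0
X = rng.standard_normal((n0, m))                  # random data matrix
Y = np.tanh(W @ X)                                # pointwise nonlinearity applied entrywise

M = (Y.T @ Y) / m                                 # Gram matrix
eigvals = np.linalg.eigvalsh(M)                   # its eigenvalues

# Histogram of eigenvalues approximates the limiting spectral distribution.
density, edges = np.histogram(eigvals, bins=100, density=True)
print("largest eigenvalue:", eigvals.max())

# Empirical trace of the resolvent at a point z off the real axis,
# G(z) = (1/m) tr[(M - z I)^{-1}]; its large-m limit encodes the spectral density.
z = 0.5 + 1e-3j
G = np.trace(np.linalg.inv(M - z * np.eye(m))) / m
print("empirical trace of the resolvent at z:", G)
```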
Similar resources
Deep Semi-Random Features for Nonlinear Function Approximation
We propose semi-random features for nonlinear function approximation. The flexibility of semi-random features lies between the fully adjustable units in deep learning and the random features used in kernel methods. For one hidden layer models with semi-random features, we prove with no unrealistic assumptions that the model classes contain an arbitrarily good function as the width increases (univ...
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output ma...
Learning Deep Representations By Distributed Random Samplings
In this paper, we propose an extremely simple deep model for the unsupervised nonlinear dimensionality reduction – deep distributed random samplings. First, its network structure is novel: each layer of the network is a group of mutually independent k-centers clusterings. Second, its learning method is extremely simple: the k centers of each clustering are only k randomly selected examples from...
Structured Image Classification from Conditional Random Field with Deep Class Embedding
This paper presents a novel deep learning architecture to classify structured objects in datasets with a large number of visually similar categories. Our model extends the CRF objective function to a nonlinear form, by factorizing the pairwise potential matrix, to learn neighboring-class embedding. The embedding and the classifier are jointly trained to optimize this highly nonlinear CRF object...